arXiv:cmp-lg/9605014v1 12 May 1996 Clustering Words with the MDL Principle

Author

  • Hang Li
Abstract

We address the problem of automatically constructing a thesaurus by clustering words based on corpus data. We view this problem as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose a learning algorithm based on the Minimum Description Length (MDL) Principle for such estimation. We empirically compared the performance of our method based on the MDL Principle against the Maximum Likelihood Estimator in word clustering, and found that the former outperforms the latter. We also evaluated the method by conducting pp-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that such a thesaurus can be used to improve accuracy in disambiguation.
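
To make the idea of scoring a pair of partitions concrete, here is a minimal sketch of an MDL-style criterion. It is not taken from the paper: the function name `description_length`, the toy data, the assumption that probability is spread uniformly over the words inside each class pair, and the parameter cost of (1/2) log2 N bits per free parameter are all illustrative assumptions.

```python
import math
from collections import Counter


def description_length(pairs, noun_class, verb_class):
    """Description length in bits of (noun, verb) pairs under a hard clustering.

    Assumptions (illustrative, not from the source): probability mass of a
    class pair is shared uniformly by its member word pairs, and each free
    parameter costs 0.5 * log2(N) bits, where N is the number of pairs.
    """
    n_total = len(pairs)
    # Number of words assigned to each class.
    noun_sizes = Counter(noun_class.values())
    verb_sizes = Counter(verb_class.values())
    # Co-occurrence counts at the level of (noun class, verb class) cells.
    cell_counts = Counter((noun_class[n], verb_class[v]) for n, v in pairs)

    data_bits = 0.0
    for (cn, cv), count in cell_counts.items():
        p_cell = count / n_total  # maximum-likelihood estimate of P(Cn, Cv)
        p_pair = p_cell / (noun_sizes[cn] * verb_sizes[cv])
        data_bits -= count * math.log2(p_pair)

    # One probability per cell, minus one because they sum to 1.
    free_params = len(noun_sizes) * len(verb_sizes) - 1
    model_bits = 0.5 * free_params * math.log2(n_total)
    return data_bits + model_bits


# Toy corpus (hypothetical): each pair observed three times.
pairs = 3 * [("wine", "drink"), ("beer", "drink"),
             ("bread", "eat"), ("rice", "eat")]
verbs = {"drink": "V1", "eat": "V2"}

grouped = {"wine": "N1", "beer": "N1", "bread": "N2", "rice": "N2"}
singletons = {"wine": "N1", "beer": "N2", "bread": "N3", "rice": "N4"}
one_class = {"wine": "N1", "beer": "N1", "bread": "N1", "rice": "N1"}

dl_grouped = description_length(pairs, grouped, verbs)
dl_singletons = description_length(pairs, singletons, verbs)
dl_one_class = description_length(pairs, one_class, verbs)
```

On this toy data the grouped and singleton partitions fit the data equally well, so a pure likelihood criterion cannot separate them, while the parameter cost makes the grouped partition the cheaper description. Merging all nouns into one class saves parameters but pays more in data bits.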

Similar resources

arXiv:cmp-lg/9605018v1 13 May 1996 Efficient Tabular LR Parsing

We give a new treatment of tabular LR parsing, which is an alternative to Tomita’s generalized LR algorithm. The advantage is twofold. Firstly, our treatment is conceptually more attractive because it uses simpler concepts, such as grammar transformations and standard tabulation techniques also known as chart parsing. Secondly, the static and dynamic complexity of parsing, both in space and time...

arXiv:q-alg/9705012v1 16 May 1997 Poisson structures on the center

It is shown that the elliptic algebra $\mathcal{A}_{q,p}(\widehat{sl}(2)_c)$ has a nontrivial center at the critical level $c = -2$, generalizing the result of Reshetikhin and Semenov-Tian-Shansky for trigonometric algebras. A family of Poisson structures indexed by a non-negative integer $k$ is constructed on this center.

arXiv:q-alg/9605033v1 21 May 1996 CRM-2278 March 1995 q-Ultraspherical Polynomials for q a Root of Unity

Properties of the q-ultraspherical polynomials for q being a primitive root of unity are derived using a formalism of the $so_q(3)$ algebra. The orthogonality condition for these polynomials provides a new class of trigonometric identities representing discrete finite-dimensional analogs of q-beta integrals of Ramanujan. Mathematics Subject Classifications (1991): 17B37, 33D80.

arXiv:q-alg/9605008v1 5 May 1996 QUANTUM PRINCIPAL BUNDLES & THEIR CHARACTERISTIC CLASSES

A general theory of characteristic classes of quantum principal bundles is sketched, incorporating basic ideas of classical Weil theory into the conceptual framework of non-commutative differential geometry. A purely cohomological interpretation of the Weil homomorphism is given, together with a geometrical interpretation via quantum invariant polynomials. A natural spectral sequence is describ...

Publication date: 1996